Forget About the LiDAR: Self-Supervised Depth Estimators with MED Probability Volumes
Self-supervised depth estimators have recently shown results comparable to supervised methods on the challenging single image depth estimation (SIDE) task by exploiting the geometrical relations between target and reference views in the training data. However, previous methods usually learn forward or backward image synthesis, but not depth estimation, as they cannot effectively neglect occlusions between the target and the reference images. Previous works rely on rigid photometric assumptions or on the SIDE network to infer depth and occlusions, resulting in limited performance. In contrast, we propose a method to "Forget About the LiDAR" (FAL), with Mirrored Exponential Disparity (MED) probability volumes, for the training of monocular depth estimators from stereo images. Our MED representation allows us to obtain geometrically inspired occlusion maps with our novel Mirrored Occlusion Module (MOM), which does not impose a learning burden on our FAL-net.
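The abstract does not spell out how an exponential disparity probability volume is built and aggregated. The sketch below illustrates the general idea only: exponentially spaced disparity hypotheses plus a soft-argmax over a per-pixel probability volume. The helper names (`exp_disparity_levels`, `expected_disparity`) and the exact spacing formula are assumptions for illustration, not the paper's formulation.

```python
import numpy as np

def exp_disparity_levels(d_min, d_max, n_levels):
    # Geometric (exponential) spacing of disparity hypotheses:
    # small disparities (far depths) are sampled more densely than
    # large ones. Illustrative assumption, not the paper's exact scheme.
    ratio = (d_max / d_min) ** (1.0 / (n_levels - 1))
    return d_min * ratio ** np.arange(n_levels)

def expected_disparity(logits, levels):
    # Soft-argmax over a per-pixel probability volume:
    # d(x, y) = sum_n p_n(x, y) * d_n, with p = softmax over the level axis.
    e = np.exp(logits - logits.max(axis=0, keepdims=True))
    p = e / e.sum(axis=0, keepdims=True)
    return (p * levels[:, None, None]).sum(axis=0)

levels = exp_disparity_levels(2.0, 300.0, 49)  # 49 disparity hypotheses
logits = np.random.randn(49, 4, 4)             # toy 4x4 probability volume
disp = expected_disparity(logits, levels)      # per-pixel disparity map
```

Because the softmax weights sum to one at each pixel, the predicted disparity always stays inside the hypothesis range, which is one reason volume-plus-soft-argmax designs are popular for stereo-supervised depth.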
Review for NeurIPS paper: Forget About the LiDAR: Self-Supervised Depth Estimators with MED Probability Volumes
Weaknesses: I have no major concerns, only remarks and suggestions for improvement. Although this is unambiguous in the experimental section, the abstract and introduction should clarify that the method is self-supervised from stereo pairs. There is a lot of confusion in the literature, because all monocular methods predict depth from a single image (by definition) but can be trained in different ways: from LiDAR supervision (full or partial), from stereo pairs (as is the case here), or from videos (a.k.a. SfM). Some of the authors' critiques of related works (e.g., regarding dynamic objects) are only applicable to the SfM self-supervised scenario, since in stereo-based self-supervised learning the pairs of images are captured at the same time. Furthermore, the SfM case requires estimating the camera's ego-motion, which vastly complicates the self-supervised learning task (hence why the comparison is not entirely fair, in my opinion).